pacman::p_load(igraph, ggraph, visNetwork, tidyverse, graphlayouts, jsonlite, heatmaply, stringr, tidytext)
LIANG YAO
June 17, 2023
June 10, 2023
FishEye International, a non-profit focused on countering illegal, unreported, and unregulated (IUU) fishing, has been given access to an international finance corporation’s database on fishing-related companies. In the past, FishEye has determined that companies with anomalous structures are far more likely to be involved in IUU (or other “fishy” business). FishEye has transformed the database into a knowledge graph. It includes information about companies, owners, workers, and financial status. FishEye is aiming to use this graph to identify anomalies that could indicate a company is involved in IUU.
FishEye analysts have attempted to use traditional node-link visualizations and standard graph analyses, but these were found to be ineffective because the scale and detail in the data can obscure a business’s true structure. Can you help FishEye develop a new visual analytics approach to better understand fishing business anomalies?
Use visual analytics to understand patterns of groups in the knowledge graph and highlight anomalous groups.
Use visual analytics to identify anomalies in the business groups present in the knowledge graph. Limit your response to 400 words and 5 images.
Develop a visual analytics process to find similar businesses and group them. This analysis should focus on a business’s most important features and present those features clearly to the user. Limit your response to 400 words and 5 images.
Measure similarity of businesses that you group in the previous question. Express confidence in your groupings visually. Limit your response to 400 words and 4 images.
Based on your visualizations, provide evidence for or against the case that anomalous companies are involved in illegal fishing. Which business groups should FishEye investigate further? Limit your response to 600 words and 6 images.
#view(mc2[["nodes"]])
mc3_nodes <- as_tibble(mc3$nodes) %>%
  mutate(country = as.character(country),
         id = as.character(id),
         product_services = as.character(product_services),
         revenue_omu = as.numeric(as.character(revenue_omu)),
         type = as.character(type)) %>%
  select(id, country, type, revenue_omu, product_services)
# group_by(id,country, type, product_services) %>%
# summarise(count=n(),revenue=sum(revenue_omu))
Rows: 24,036
Columns: 4
$ source <chr> "1 AS Marine sanctuary", "1 AS Marine sanctuary", "1 Ltd. Liabi…
$ target <chr> "Christina Taylor", "Debbie Sanders", "Angela Smith", "Catherin…
$ type <chr> "Company Contacts", "Beneficial Owner", "Beneficial Owner", "Co…
$ weight <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
# A tibble: 3,244 × 2
product_services count
<chr> <int>
1 (Italian) peeled tomatoes, legumes, vegetables, fruits and canned mush… 1
2 100 percent Spanish olives; peppers, green, black, and manzanilla stuf… 1
3 2 or 3-piece containers, twist off caps, easy opening and traditional … 1
4 8 Cement Mixer Units, Ocean Freight, Air Freight, Project Logistics, C… 1
5 A chemical science firm with a focus on the development of high purity… 1
6 A complete range of fully-vertical, Schiffli embroidery manufacturing … 1
7 A complete range of transportation and logistics services 1
8 A customs broker and freight forwarder 1
9 A distributor, importer and exporter of food products to the food reta… 1
10 A freight broker 1
# ℹ 3,234 more rows
The tokens “character”, “0”, and “unknown” also need to be removed as stop words.
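A minimal sketch of this stop-word step, assuming the text is tokenized with tidytext. The `nodes_demo` table and its rows are made up for illustration and are not taken from the MC3 data; `custom_stop` is a hypothetical name.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for mc3_nodes (hypothetical rows, not from the dataset)
nodes_demo <- tibble(
  id = c("A", "B", "C"),
  product_services = c("Fish and seafood products", "character(0)", "Unknown")
)

# Extra stop words on top of tidytext's built-in stop_words list
custom_stop <- tibble(word = c("character", "0", "unknown"))

tokens <- nodes_demo %>%
  unnest_tokens(word, product_services) %>%  # one lowercased token per row
  anti_join(stop_words, by = "word") %>%     # drop common English stop words
  anti_join(custom_stop, by = "word")        # drop the placeholder tokens

tokens
```

Because `unnest_tokens()` lowercases and strips punctuation, a value such as `character(0)` splits into the tokens `character` and `0`, which is why both appear in the custom list.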
# A tibble: 27,622 × 7
id country type revenue_omu product_services n_fish `"fish"`
<chr> <chr> <chr> <dbl> <chr> <int> <chr>
1 Jones LLC ZH Comp… 310612303. Automobiles 11 fish
2 Coleman, Hall and… ZH Comp… 162734684. Passenger cars,… 39 fish
3 Aqua Advancements… Oceanus Comp… 115004667. Holding firm wh… 248 fish
4 Makumba Ltd. Liab… Utopor… Comp… 90986413. Car service, ca… 428 fish
5 Taylor, Taylor an… ZH Comp… 81466667. Fully electric … 72 fish
6 Harmon, Edwards a… ZH Comp… 75070435. Discount superm… 59 fish
7 Punjab s Marine c… Riodel… Comp… 72167572. Beef, pork, chi… 652 fish
8 Assam Limited L… Utopor… Comp… 72162317. Power and Gas s… 1737 fish
9 Ianira Starfish S… Rio Is… Comp… 68832979. Light commercia… 94 fish
10 Moran, Lewis and … ZH Comp… 65592906. Automobiles, tr… 88 fish
# ℹ 27,612 more rows
# A tibble: 27,622 × 7
id country type revenue_omu product_services n_logistic
<chr> <chr> <chr> <dbl> <chr> <int>
1 Jones LLC ZH Comp… 310612303. Automobiles 11
2 Coleman, Hall and Lopez ZH Comp… 162734684. Passenger cars,… 39
3 Aqua Advancements Sash… Oceanus Comp… 115004667. Holding firm wh… 248
4 Makumba Ltd. Liability… Utopor… Comp… 90986413. Car service, ca… 428
5 Taylor, Taylor and Far… ZH Comp… 81466667. Fully electric … 72
6 Harmon, Edwards and Ba… ZH Comp… 75070435. Discount superm… 59
7 Punjab s Marine conser… Riodel… Comp… 72167572. Beef, pork, chi… 652
8 Assam Limited Liabil… Utopor… Comp… 72162317. Power and Gas s… 1737
9 Ianira Starfish Sagl I… Rio Is… Comp… 68832979. Light commercia… 94
10 Moran, Lewis and Jimen… ZH Comp… 65592906. Automobiles, tr… 88
# ℹ 27,612 more rows
# ℹ 1 more variable: `"transportation"` <chr>
Filter the edges to keep only those with the most frequent hscodes.
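One way this filter could look, as a sketch only. The `edges_demo` table and its hscode values are hypothetical, and keeping just the single most frequent code (`n = 1`) is an assumption; the cutoff used in the original analysis is not shown.

```r
library(dplyr)

# Hypothetical edge list; hscode values are illustrative only
edges_demo <- tibble(
  source = c("a", "a", "b", "c", "d", "e"),
  target = c("b", "c", "c", "d", "e", "a"),
  hscode = c("306170", "306170", "306170", "950300", "306170", "870899")
)

# Rank hscodes by edge count and keep the most frequent one(s)
top_codes <- edges_demo %>%
  count(hscode, sort = TRUE) %>%
  slice_head(n = 1) %>%   # assumed cutoff; raise n to keep more codes
  pull(hscode)

# Keep only the edges that carry one of the selected hscodes
edges_major <- edges_demo %>%
  filter(hscode %in% top_codes)

edges_major
```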
Prepare data for interactive network graph.
Build an interactive network graph to check the position of each node.
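A rough sketch of these two steps with visNetwork, which expects a node table with an `id` column and an edge table with `from` and `to` columns. The `edges_demo` data is hypothetical, and the Fruchterman-Reingold layout is just one reasonable choice, not necessarily the one used here.

```r
library(dplyr)
library(visNetwork)

# Hypothetical edge list standing in for the filtered edges
edges_demo <- tibble(
  source = c("a", "a", "b"),
  target = c("b", "c", "c")
)

# visNetwork needs edges with from/to and nodes with id (label is optional)
vis_edges <- edges_demo %>%
  rename(from = source, to = target)

vis_nodes <- tibble(id = unique(c(vis_edges$from, vis_edges$to))) %>%
  mutate(label = id)

# Interactive graph: igraph layout, neighbour highlighting, node picker
net <- visNetwork(vis_nodes, vis_edges) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)

net
```

The `nodesIdSelection` drop-down makes it easy to jump to a specific node and see where it sits in the layout.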
First, read all 12 files provided by FishEye into one table.
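A sketch of reading several files into one table, assuming each file is JSON with a `links` element. That layout, the `bundles_demo` folder, and the file `other.json` are assumptions for illustration; two tiny files are written to a temporary folder first so the example runs on its own.

```r
library(dplyr)
library(purrr)
library(jsonlite)

# Hypothetical setup: write two small JSON files to a temp folder
bundle_dir <- file.path(tempdir(), "bundles_demo")
dir.create(bundle_dir, showWarnings = FALSE)
write_json(
  list(links = data.frame(source = "a", target = "b", hscode = "306170")),
  file.path(bundle_dir, "carp.json")
)
write_json(
  list(links = data.frame(source = c("b", "c"), target = c("c", "d"),
                          hscode = c("306170", "950300"))),
  file.path(bundle_dir, "other.json")
)

# List every JSON file and name each path by its file name without extension
files <- list.files(bundle_dir, pattern = "\\.json$", full.names = TRUE)
names(files) <- tools::file_path_sans_ext(basename(files))

# Read each file's links and stack them, recording the source file name
bundle_links <- map_dfr(files, ~ as_tibble(fromJSON(.x)$links), .id = "file")

bundle_links
```

With the real data, `bundle_dir` would point at the folder holding the 12 provided files.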
Then check the number of edges by “hscode” and by “generated_by” (I renamed this column to “group”).
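The rename and count could be sketched as follows. The `links_demo` rows are hypothetical, and `n_edges` is a name chosen here for the count column.

```r
library(dplyr)

# Hypothetical combined links table
links_demo <- tibble(
  hscode = c("306170", "306170", "950300", "306170"),
  generated_by = c("carp", "carp", "carp", "other")
)

# Rename generated_by to group, then count edges per group and hscode
edge_counts <- links_demo %>%
  rename(group = generated_by) %>%
  count(group, hscode, sort = TRUE, name = "n_edges")

edge_counts
```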
I select the “carp” group as the new set of links to add to the mc2 graph. In the faceted node graph, this set of links has the most nodes and is also the sparsest, which suggests it can contribute the most to the original graph. At the same time, the faceted network graph shows little difference between groups.